In [1]:
import metatab as mt
import pandas as pd

p = mt.open_package('..')
p


Out[1]:

Salaries for Los Angeles County 2011 to 2015

transparentcalifornia.com-los_angeles_salaries-2011_2015-lac-1

Salaries for Los Angeles County employees from 2011 to 2015

Documentation

2011–2015 salaries for Los Angeles County Web-based table search for all salary records

Contacts

Origin:

Wrangler: Eric Busboom Civic Knowledge

Resources

  1. salaries-2011 - http://transparentcalifornia.com/export/los-angeles-county-2011.csv Los Angeles County employee salaries 2011

  2. salaries-2012 - http://transparentcalifornia.com/export/los-angeles-county-2012.csv Los Angeles County employee salaries 2012

  3. salaries-2013 - http://transparentcalifornia.com/export/los-angeles-county-2013.csv Los Angeles County employee salaries 2013

  4. salaries-2014 - http://transparentcalifornia.com/export/los-angeles-county-2014.csv Los Angeles County employee salaries 2014

  5. salaries-2015 - http://transparentcalifornia.com/export/los-angeles-county-2015.csv Los Angeles County employee salaries 2015


In [2]:
dfs = [ r.dataframe() for r in p.resources()]

In [5]:
df = pd.concat(dfs)

In [8]:
df.iloc[:5].T


Out[8]:
0 1 2 3 4
agency NaN NaN NaN NaN NaN
base_pay 257993 393137 329008 346455 436447
benefits NaN NaN NaN NaN NaN
employee_name LARRY L WALDIE GAIL ANDERSON JR WING-FAI KWAN DAVID A GOLDSTEIN JOHN P GRUEN
job_title UNDERSHERIFF, UNCLASSIFIED MEDICAL DIRECTOR II PHYSICIAN SPECIALIST, MF CHIEF PHYSICIAN I CHIEF PHYSICIAN III
jurisdiction_name Los Angeles County Los Angeles County Los Angeles County Los Angeles County Los Angeles County
notes None None None None None
other_pay 317470 75264 28103.5 104211 51264.1
overtime_pay 0 0 113766 0 0
status NaN NaN NaN NaN NaN
total_benefits 105898 86816.6 82352.3 98005.8 53105.4
total_pay 575463 468401 470878 450666 487711
total_pay_benefits 681361 555218 553230 548672 540817
year 2011 2011 2011 2011 2011

In [11]:
df.year.value_counts()


Out[11]:
2015    104176
2014    103387
2012    100520
2011     99935
2013     99596
Name: year, dtype: int64

Name Matching

Because a large portion of the LAC employees are hispanic, we must have an employee id other than name of match records; Hispanics, and particularly Mexican-Americans, have a resricted range of both given and family names that makes matching on names very ambiguous.


In [17]:
df.employee_name.value_counts().head(50)


Out[17]:
Not provided           2725
MARIA RODRIGUEZ          33
MARIA RAMIREZ            32
MARIA HERNANDEZ          32
MARIA LOPEZ              30
MARIA GARCIA             29
MARIA PEREZ              28
JOSE GONZALEZ            28
PATRICIA RODRIGUEZ       27
MARIA GONZALEZ           27
PATRICIA GONZALEZ        27
JOSE GARCIA              27
DANIEL RODRIGUEZ         26
MARIA SANCHEZ            26
JOSE MARTINEZ            24
CARLOS PEREZ             23
DAVID LOPEZ              23
GABRIELA HERNANDEZ       23
MONICA LOPEZ             23
ELIZABETH GARCIA         23
VERONICA GARCIA          22
ELIZABETH HERNANDEZ      22
JOSE HERNANDEZ           22
LETICIA GARCIA           21
ELIZABETH PEREZ          21
VERONICA HERNANDEZ       19
ELIZABETH GONZALEZ       19
JUAN RODRIGUEZ           19
SANDRA GARCIA            19
MONICA RODRIGUEZ         19
ARMANDO MARTINEZ         18
MARTHA HERNANDEZ         18
MARIA TORRES             18
MARIA G RODRIGUEZ        18
MARIA MARTINEZ           18
MARIA RIVERA             18
GREGORY WILLIAMS         18
SANDRA CRUZ              18
SONIA LOPEZ              17
SERGIO LOPEZ             17
DANIEL GONZALEZ          17
ARMANDO GONZALEZ         17
VICTOR RODRIGUEZ         17
YOLANDA SANCHEZ          17
DAVID HERNANDEZ          17
MARIA GOMEZ              17
LORENA GONZALEZ          17
ELIZABETH SANCHEZ        17
MICHAEL JONES            16
MARGARITA LOPEZ          16
Name: employee_name, dtype: int64

In [19]:
len(df.employee_name.unique())


Out[19]:
279280

In [20]:
len(df)


Out[20]:
507614

In [32]:
df.groupby('employee_name')


Out[32]:
<pandas.core.groupby.DataFrameGroupBy object at 0x117855400>

In [30]:
df[df.employee_name == 'CLARISSA RAMOS']


Out[30]:
agency base_pay benefits employee_name job_title jurisdiction_name notes other_pay overtime_pay status total_benefits total_pay total_pay_benefits year
84752 NaN 25566.0 NaN CLARISSA RAMOS ELIGIBILITY WORKER II Los Angeles County None 2351.0 51.0 NaN 18841.0 27968.0 46809.0 2013
68207 Los Angeles County 41906.0 27278.0 CLARISSA RAMOS ELIGIBILITY WORKER II NaN None 2747.0 468.0 None NaN 45121.0 72400.0 2015

In [ ]: